Download the workshop files
Install any required libraries
October 2, 2017
Download the workshop files
Install any required libraries
Who are you?
Why are you here?
There are a number of different R packages for working with geographic data and creating maps, including:
You can also use the R base graphics function plot() to make quick plots by plotting longitude or x coordinates on the X axis and latitude or y coordinates on the Y axis.
But it's quick & sometimes dirty, especially when your data covers a large geographic extent. Can you guess why?
Distances between degrees of longitude decrease with distance from the equator.
So, plotting latitude and longitude across large areas on a 2D cartesian plot will distort distances and area.
This tutorial will show you how to create maps using the ggplot2 and ggmaps packages.
We will focus on point data.
gplot2 is perhaps the most widely used R package for plotting data. While not specifically for geographic data, you can use its expressive and powerful syntax to create beautiful maps.
People who use ggplot can leverage that knowledge. However, newcomers to the package may find it non-intuitive. Fortunately, there are many online tutorials and cheat-sheets for using ggplot. Some starting points are listed at the end of this tutorial.
ggmap extends ggplot2 with functionality for incorporating map data from popular online services like Google Maps data and OpenStreetMaps. It also has other functions for geocoding, computing distances and more.
Use ?ggmap to access the help page for ggmap and get the details on its functionality.
A great article by the authors of ggmap, David Kahle and Hadley Wickham, that also serves as a tutorial, can be found here: https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf.
Let's start with a brief review of geographic data fundamentals
Geographic data are observations about locations on or near the surface of the Earth. For example:
![]()
That tells you some useful information about this place but it doesn't tell you where it is on the surface of the Earth.
Geospatial data are geographic data that represent location geometrically with coordinates, such as latitude and longitude.
These coordinates are referenced to specific locations on the Earth using a a coordinate reference system.
![]()

Can you tell which coordinate value is latitude and which one is longitude?
46.130479, -117.134167
46.130479, -117.134167
Latitude is zero at the equator and decreases to -90 at the South Pole and increases to + 90 at the North pole.
Longitude is zero at the Prime Meridian (near London, England) and increases to -180 as you move west and to + 180 as you move east.
So, since latitude is never > 90 or less than < -90 the order must be latitude, longitude.
Spatial data is a more generic term that is not just for geographic data.
Spatial data are powerful because software can dynamically determine spatial metrics like area and length, characteristics like distance and direction, and relationships like inside and intersects from these data.
We will start by mapping some October AirBNB rental data that was downloaded from: Inside Airbnb on October 2, 2017.
These data are provided for instructional use only!
We will look at two bedroom Airbnb rentals.
# Set your working directory to the tutorial folder
setwd("~/Documents/Dlab/dlab_workshops/Maps_with_ggplot2_and_ggmap_tutorial")
# Load the data
sf_apts <- read.csv("data/sf_airbnb_2bds.csv")
str(sf_apts)
## 'data.frame': 1206 obs. of 14 variables: ## $ id : int 91957 18139835 172943 2309140 2559297 149108 10832295 15134083 283235 8013423 ... ## $ name : Factor w/ 1203 levels " Victorian Suite in Inner Mission",..: 1091 267 917 1128 479 904 349 696 1064 976 ... ## $ latitude : num 37.8 37.8 37.8 37.8 37.8 ... ## $ longitude : num -122 -122 -122 -122 -122 ... ## $ property_type : Factor w/ 4 levels "Apartment","Condominium",..: 1 1 3 1 1 3 2 1 1 1 ... ## $ room_type : Factor w/ 1 level "Entire home/apt": 1 1 1 1 1 1 1 1 1 1 ... ## $ accommodates : int 6 4 3 4 4 3 6 4 4 4 ... ## $ bathrooms : num 1.5 1 1 1 1 1 1 1 1 2 ... ## $ bedrooms : int 2 2 2 2 2 2 2 2 2 2 ... ## $ beds : int 4 2 2 2 2 2 2 2 2 2 ... ## $ price : int 300 275 279 395 235 300 450 349 199 235 ... ## $ review_scores_rating: int 94 100 100 97 86 100 100 92 96 93 ... ## $ neighbourhood : Factor w/ 52 levels "Alamo Square",..: 7 52 7 52 52 7 52 26 52 20 ... ## $ listing_url : Factor w/ 1206 levels "https://www.airbnb.com/rooms/1003756",..: 1156 490 446 666 689 293 55 311 711 1073 ...
options(width = 80) head(sf_apts[,1:4])
## id name latitude longitude ## 1 91957 Sunny SF Flat with 5*s 37.76194 -122.4493 ## 2 18139835 Brand New Apartment in the Heart of NOPA! 37.77568 -122.4453 ## 3 172943 Redwood Cottage on Ashbury Heights 37.76022 -122.4495 ## 4 2309140 The Lady Di 37.77524 -122.4394 ## 5 2559297 Entire 2BR w/pkg centrally-located! 37.77507 -122.4409 ## 6 149108 QUIET LUXURY COLE VALLEY PARKING 37.76101 -122.4507
head(sf_apts[,5:9])
## property_type room_type accommodates bathrooms bedrooms ## 1 Apartment Entire home/apt 6 1.5 2 ## 2 Apartment Entire home/apt 4 1.0 2 ## 3 House Entire home/apt 3 1.0 2 ## 4 Apartment Entire home/apt 4 1.0 2 ## 5 Apartment Entire home/apt 4 1.0 2 ## 6 House Entire home/apt 3 1.0 2
head(sf_apts[,10:13])
## beds price review_scores_rating neighbourhood ## 1 4 300 94 Cole Valley ## 2 2 275 100 Western Addition/NOPA ## 3 2 279 100 Cole Valley ## 4 2 395 97 Western Addition/NOPA ## 5 2 235 86 Western Addition/NOPA ## 6 2 300 100 Cole Valley
Basic structure of a ggplot2 command begins with the call to the function ggplot().
In it's simplest form, a ggplot map requires:
ggplot(data=the_data_frame, aes(x = x_coord, y=y_coord)) + geom_point()
Note:argument names for data=, x= and y= are not required if they are provided in the order specificed by the function.
You can also use this syntax to create a map with ggplot:
ggplot() + geom_point(aes(x = x_coord, y=y_coord), data=the_data_frame)
The benefit of this syntax is that it makes it easy to swap out ggmap() and ggplot() function calls.
It makes it easier to more map data with multiple geometry calls.
Use both syntaxes to create a simple map of sf_airbnb points.
data=library(ggplot2) ggplot(data=sf_apts, aes(x=longitude, y=latitude)) + geom_point() ggplot(sf_apts, aes(longitude, latitude)) + geom_point() # ggplot() + geom_point(data=sf_apts, aes(x=longitude, y=latitude)) ggplot() + geom_point(aes(longitude,latitude), sf_apts )
How might we improve it?
ggplot2 geometry typesggplot2 can plot a number of geometry types.
To see them all, enter ?geom_ and hit the tab key. Move your mouse down to get the short description of the geometry.

View the help page by entering ?geom_point.
Question: do any of the examples on the help page create a map of geographic data?
Use the geom_point options to
Notes
These arguments do not go inside the aes() function unless they are mapped to a data column
You can see some of the possible values in the help pages
Or see the Quick-R website's page for graphical parameters.
ggplot() +
geom_point(data=sf_apts, aes(longitude,latitude),
colour="red", size=4, alpha=0.25)
Does this style add any value?
ggplot() +
geom_point(data=sf_apts, aes(longitude,latitude),
colour="red", size=4, alpha=0.25) +
theme_void() +
theme(plot.title = element_text(hjust = 0.5)) +
ggtitle("Airbnb 2 Bedroom Rentals, San Francisco")
Why?
Let's use the ggmap package to add a basemap to our map.
We will then display our Airbnb locations on top of the basemap.
A basemap is a reference map on top of which your primary map data is overlaid.
First, load the library and take a look at the help page
library(ggmap) ?ggmap
You will see from the ggmap help page that the main function is get_map. Take a look at ?get_map help page.

get_map allows you to retrieve map tiles from an online map service like Google Maps and use them in your map. Here are the steps:
get_map() with a location name or coordinates to fetch map tiles from an online service.ggmap() to display the map.Let's see what we get for San Francisco with the default settings.
sfmap <- get_map(location="San Francisco") ggmap(sfmap)
By default, ggmap automatically calculates a default zoom level based on your location and or data. As stated in the ggmap help page, zoom levels are integer values that range from 3 (continent) to 18 - 21 (local area) and the default setting is 10 (city).
Set the
zoom=parameter to a number that will give us an extent more appropriate for our Airbnb data.
sfmap <- get_map(location="San Francisco", zoom=12) ggmap(sfmap)
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Francisco&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Francisco&sensor=false
Let's overlay the Airbnb points on the basemap.
One way to do this is with the following syntax:
ggmap(<your map object>) + geom_point(<your point data options>)
For example:
ggmap(sfmap) + geom_point(data=sf_apts, aes(longitude, latitude))
Try it!
# Don't recreate the basemap if you haven't changed it! # sfmap <- get_map(location="San Francisco", zoom=12) ggmap(sfmap) + geom_point(data=sf_apts, aes(longitude, latitude))
Redo the previous map with some of the symbology changes we made before:
# Don't recreate the basemap if you haven't changed it!
# sfmap <- get_map(location="San Francisco", zoom=12)
airbnb_map <- ggmap(sfmap) + geom_point(data=sf_apts, aes(longitude, latitude),
colour="red", size=3, alpha=0.35) +
theme_void() +
theme(plot.title = element_text(hjust = 0.5)) +
ggtitle("Airbnb 2 Bedroom Rentals, San Francisco")
airbnb_map
Change the basemap to stamen toner. See ?get_map for help.
Hint: you need to add two arguments in the get_map() function
# Don't recreate the basemap if you haven't changed it!
sfmap <- get_map(location="San Francisco", zoom=12,
source = "stamen", maptype="toner")
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Francisco&zoom=12&size=640x640&scale=2&maptype=terrain&sensor=false
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Francisco&sensor=false
## Map from URL : http://tile.stamen.com/toner/12/653/1581.png
## Map from URL : http://tile.stamen.com/toner/12/654/1581.png
## Map from URL : http://tile.stamen.com/toner/12/655/1581.png
## Map from URL : http://tile.stamen.com/toner/12/656/1581.png
## Map from URL : http://tile.stamen.com/toner/12/653/1582.png
## Map from URL : http://tile.stamen.com/toner/12/654/1582.png
## Map from URL : http://tile.stamen.com/toner/12/655/1582.png
## Map from URL : http://tile.stamen.com/toner/12/656/1582.png
## Map from URL : http://tile.stamen.com/toner/12/653/1583.png
## Map from URL : http://tile.stamen.com/toner/12/654/1583.png
## Map from URL : http://tile.stamen.com/toner/12/655/1583.png
## Map from URL : http://tile.stamen.com/toner/12/656/1583.png
## Map from URL : http://tile.stamen.com/toner/12/653/1584.png
## Map from URL : http://tile.stamen.com/toner/12/654/1584.png
## Map from URL : http://tile.stamen.com/toner/12/655/1584.png
## Map from URL : http://tile.stamen.com/toner/12/656/1584.png
mymap<- ggmap(sfmap) + geom_point(data=sf_apts, aes(longitude, latitude), colour="red", size=3, alpha=0.35) +
theme_void() +
theme(plot.title = element_text(hjust = 0.5)) +
ggtitle("Airbnb 2 Bedroom Rentals, San Francisco")
mymap
We can overlay one or more geometries on top of our map by chaining geometry functions like +geom_point(...)
Let's add a point for the civic center and label it "Civic Center".
+geom_point(aes(x,y), size=5, color="black")+geom_textmymap <- ggmap(sfmap) +
geom_point(data=sf_apts, aes(longitude, latitude), colour="red",
size=4, alpha=0.35) +
geom_point(aes(-122.419900, 37.776154), size=5, shape=15) +
geom_text(aes(-122.419900, 37.776154 - 0.01), label = "Civic Center") +
theme_void() +
theme(plot.title = element_text(hjust = 0.5)) +
ggtitle("Airbnb 2 Bedroom Rentals, San Francisco")
Data driven maps use data values to determine map symbology.
For example, we can set the color of the points by the values in one of the columns of our sf_apts data frame.
What attributes of the airbnb rentals do you think might be interesting to map?
We can map categories by color by moving the color parameter inside the aes() and setting the value to a column name.
For example:
ggmap(sfmap) + geom_point(data=sf_apts, aes(longitude, latitude, colour=property_type, size=2, alpha=1)
Try it!
ggmap(sfmap) +
geom_point(data=sf_apts, aes(longitude, latitude, colour=property_type),
size=2, alpha=1)
Alternatively, you can create a map for each category by faceting the data.
This works best if you column has few unique categories.
Note, you must use the baselayer parameter with ggmap to add facets!
mymap <- ggmap(sfmap, base_layer=ggplot(data=sf_apts, aes(longitude, latitude))) + geom_point(col="red",size=1, alpha=1) + facet_wrap(~property_type)
mymap
For numerical data we can set the map symbology based on the data values.
We can vary color or size by the data values.
Let's try that with the rental price data.
It's a good idea to take a look at the distribution of the data values before mapping.
hist(sf_apts$price)
It seems as though most of the 2 bedroom rentals are below $1,000 (per day).
Let's subset the data to reduce the impact of outliers on our maps (this is one approach).
sf_apts1k <- sf_apts[sf_apts$price < 1000,] mymap <- ggmap(sfmap) + geom_point(data=sf_apts1k, aes(x=longitude, y=latitude, color=price))
mymap
Our zoom level is a bit too low (12) to see our data well. Let's zoom in to level 13 and recenter the map on our data.
See if you can follow along with the code below - what's it doing?
sfmap13 <- get_map(location=c(mean(sf_apts1k$longitude),
mean(sf_apts1k$latitude)), zoom=13,
source = "stamen", maptype="toner")
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=37.764575,-122.432831&zoom=13&size=640x640&scale=2&maptype=terrain&sensor=false
## Map from URL : http://tile.stamen.com/toner/13/1308/3165.png
## Map from URL : http://tile.stamen.com/toner/13/1309/3165.png
## Map from URL : http://tile.stamen.com/toner/13/1310/3165.png
## Map from URL : http://tile.stamen.com/toner/13/1311/3165.png
## Map from URL : http://tile.stamen.com/toner/13/1308/3166.png
## Map from URL : http://tile.stamen.com/toner/13/1309/3166.png
## Map from URL : http://tile.stamen.com/toner/13/1310/3166.png
## Map from URL : http://tile.stamen.com/toner/13/1311/3166.png
## Map from URL : http://tile.stamen.com/toner/13/1308/3167.png
## Map from URL : http://tile.stamen.com/toner/13/1309/3167.png
## Map from URL : http://tile.stamen.com/toner/13/1310/3167.png
## Map from URL : http://tile.stamen.com/toner/13/1311/3167.png
Does the new zoom level improve the map?
ggmap(sfmap13) + geom_point(data=sf_apts1k, aes(x=longitude, y=latitude, color=price))
## Warning: Removed 111 rows containing missing values (geom_point).
We can make a custom map extent by identifying the bounding box of our data.
First, we identify the bounding box from the min/max lat and lon values
Then, we use those to create a new basemap.
bbox <- c(min(sf_apts1k$longitude), min(sf_apts1k$latitude),
max(sf_apts1k$longitude), max(sf_apts1k$latitude))
sfmap_bbox <- get_map(location = bbox, source = "stamen",
maptype = "toner-lite")
## Map from URL : http://tile.stamen.com/toner-lite/13/1308/3165.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1309/3165.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1310/3165.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1311/3165.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1308/3166.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1309/3166.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1310/3166.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1311/3166.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1308/3167.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1309/3167.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1310/3167.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1311/3167.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1308/3168.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1309/3168.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1310/3168.png
## Map from URL : http://tile.stamen.com/toner-lite/13/1311/3168.png
ggmap(sfmap_bbox) + geom_point(data=sf_apts1k, aes(x=longitude, y=latitude, color=price))
Because my points are many and small and my colors are all in a narrow range of blues, it's hard to take much away from this map about the rental prices.
Let's see if a different color scheme can improve the map.
The R package RColorBrewer is often used to select color palettes. Read the package help for more info.
You can use the display.brewer function to see the colors in a specific named palette.
For example:
library(RColorBrewer) # ?ColorBrewer # access help pages display.brewer.pal(5,"Set3") # qualitative color palette with 5 colors
Qualitative - complementary colors, e.g. pastels, to emphasize different categories
Sequential - a range of different shades of the same color (hue) to imply higher to lower ranks or values
Divergent - two squential color palettes with a light or grey color in the middle; used to emphasize outliers

display.brewer.all(type="qual") display.brewer.all(type="seq") display.brewer.all(type="div")
Let's see how a YlOrRd palette works with this data.
Here we use scale_fill_distiller to map the data to to colors.
mymap <- ggmap(sfmap_bbox) +
geom_point(data=sf_apts1k, aes(x=longitude, y=latitude, color=price)) +
scale_color_distiller(palette="YlOrRd", direction=1)
# ?scale_color_distiller
# The distiller scales extends brewer to continuous scales by smoothly interpolate 6 colours
#from any palette to a continuous scale.
mymap
When you are mapping a lot of overlapping points you can think about what you would like to highlight in the map, for example low versus high data values.
You data will display on the map in the order it appears in the data frame. So if you want all the high value points to display on top you can sort the data values ascending. Try it with this data and recreate the previos map.
# sort data frame by price
sf_apts1k <- sf_apts1k[order(sf_apts1k$price),]
mymap <- ggmap(sfmap_bbox) +
geom_point(data=sf_apts1k, aes(x=longitude, y=latitude, color=price)) +
scale_color_distiller(palette="YlOrRd", direction=1)
mymap
Another extremely important method for improving the cartographic display of continuous data is data classification. While unclassified maps are great for exploring trends and outliers, they make it very hard to interpret specific data values.
You can use a data classification method to reduce the complexity in your mapped data by classifying continuous values into N (typically 5) bins. Common methods include:
equal interval: classes the data into bins of equal data ranges, e.g. 10-20, 20-30, 30-40.
quantile: classes the data into bins with an equal number of data observations. This is the default for most mapping software.
jenks/natural breaks: classes data into bins that minmizie within group variance and maximize between group variance.
There are many ways to do this in R and with ggplot and ggpmap. We show one below that applies the scale_color_distiller function to the ggmap object. Read the function documentation for more information.
library(classInt) mymap <- ggmap(sfmap_bbox) + geom_point(data=sf_apts1k, aes(x=longitude, y=latitude, color=price), size=2, alpha=1) + scale_color_distiller(name="price",palette = "Spectral", breaks = classIntervals(sf_apts1k$price, n=6, style="quantile")$brks )
mymap
Recreate the previous map using the equal interval and Jenks classification methods.
To do this set the style= argument in the scale_color_distiller function to equal or jenks, respectively.
Bonus: set the guide= argument to get a legend instead of a colorbar.
Thus, far we have worked with point data.
Other common types of geospatial data include lines, polygons and grids.
Points, lines and polygons are classified as Vector Data while grid cells are Raster Data
Vector data maps with points and polygons are quite common, with line data less so.
Raster data maps are often used to create density maps, called heat maps, or surface maps.
Let's map some polygon data from the maps package.
First, load the library and the data in it.
library(maps) # what data are in the package? data(package="maps") # more info ?map_data
and take a look at it
states <- map_data("state") # See ?map_data for details
head(states)
## long lat group order region subregion ## 1 -87.46201 30.38968 1 1 alabama <NA> ## 2 -87.48493 30.37249 1 2 alabama <NA> ## 3 -87.52503 30.37249 1 3 alabama <NA> ## 4 -87.53076 30.33239 1 4 alabama <NA> ## 5 -87.57087 30.32665 1 5 alabama <NA> ## 6 -87.58806 30.32665 1 6 alabama <NA>
What geom_* type would you use?
ggplot(states, aes(x=long, y=lat)) + geom_polygon()
ggplot(states, aes(x=long, y=lat)) + geom_polygon(aes(group=group))
ggplot(states, aes(x=long, y=lat)) + geom_polygon(aes(group=group), color="red", fill="blue")
Create some plots of the states
change the fill and outline colors
set an alpha value for the fill opacity
usmap <- ggplot(states, aes(x=long, y=lat)) + geom_polygon(aes(group=group), color="black", fill="white")
usmap
The world isn't flat.
When you plot geographic coordinates in 2D you get distortion.
A map projectionis a mathematical model that transforms geo coordinates to 2D.
This process introduces distortion.
Try these commands
?coord_map
usmap # Use cartesian coordinates
usmap + coord_map() # With mercator projection
usmap + coord_map("orthographic")
usmap + coord_map("polyconic")
Map areas by data values
Also called thematic maps or data maps
Sometimes called heat maps
ggplot(states, aes(x=long, y=lat)) + geom_polygon(aes(group=group, fill=region), color="white")
Add a coord_map()
Set the theme to void
ggplot(states, aes(x=long, y=lat)) +
geom_polygon(aes(group=group, fill=region), color="white") +
coord_map("polyconic") +
theme_void()
The state data includes a matrix called state.x77 that has key variables for each state. Let's take a look.
data(state) state_att <- state.x77 # rename it head(state_att) head(states)
unique(states$region)
unique(row.names(state_att)) # Alaska/DC not in both data sets, but it's okay. why?
# tolower state names
#convert to a data frame
state_att <- as.data.frame(state_att[-which(row.names(state_att)=="Alaska"),])
# not necessary. why?
state_att$name <- tolower(row.names(state_att)) # why is this necessary? states_dat <- merge(states, state_att, by.x="region", by.y="name") head(state_dat)
ggplot(states_dat, aes(x=long, y=lat)) +
geom_polygon(aes(group=group, fill=Population), colour="black") +
coord_map("polyconic")
Take a look at the ?scale_fill_distiller help page. Try changing some of the options, eg add *palette="Reds" inside scale_fill_distiller().
ggplot(states_dat, aes(x=long, y=lat)) +
geom_polygon(aes(group=group, fill=Population), colour="NA") +
scale_fill_distiller(palette = "Reds") +
coord_map("mercator") +
labs(title = "Population by State") +
theme_nothing(legend=TRUE) +
guides(fill = guide_legend(reverse = TRUE))
Display the population data using ggmap on top of a basemap of the continental USA
?ggsave